Here, I’m going to visualize the number of machine learning papers published in the biomedical literature. Let’s query PubMed and count, year by year, how many papers matching “machine learning” are indexed in the PubMed database. The search starts in 1957, which is when the first neural network seems to have been built (https://www.forbes.com/sites/bernardmarr/2016/02/19/a-short-history-of-machine-learning-every-manager-should-read/#2b96042815e7).

library(rentrez)
library(ggplot2)
library(plotly)
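# Count the PubMed records matching `term` that were published in `year`.
# [PDAT] restricts the search to the publication date; retmax=0 asks only
# for the number of hits, not for the record IDs themselves.
search_year <- function(year, term){
  query <- paste(term, "AND (", year, "[PDAT])")
  entrez_search(db="pubmed", term=query, retmax=0)$count
}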
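To see exactly what gets sent to PubMed, here is a minimal sketch of a hypothetical `build_query()` helper. It is not part of rentrez. It builds the same kind of year-restricted query with `sprintf()`. The optional `phrase = TRUE` wraps the term in double quotes. In PubMed this searches the exact phrase instead of letting automatic term mapping expand it. Whether that changes the counts noticeably is an assumption worth checking, not something measured here.

```r
# Hypothetical helper (not part of rentrez): build a PubMed query that
# restricts `term` to papers published in `year` via the [PDAT] field tag.
build_query <- function(term, year, phrase = FALSE) {
  if (phrase) {
    # Quoting asks PubMed for the exact phrase (no automatic term mapping)
    term <- paste0('"', term, '"')
  }
  sprintf("%s AND %d[PDAT]", term, as.integer(year))
}

# Hypothetical variant of search_year() built on the helper; needs rentrez
search_year_q <- function(year, term, phrase = FALSE) {
  rentrez::entrez_search(db = "pubmed", term = build_query(term, year, phrase),
                         retmax = 0)$count
}

build_query("machine learning", 2014)                 # "machine learning AND 2014[PDAT]"
build_query("machine learning", 2014, phrase = TRUE)
```

Running `search_year_q()` with both `phrase = FALSE` and `phrase = TRUE` for a few years is a quick way to see how sensitive the trend is to the query wording.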

year <- 1957:2014

Next, let’s run one search per year and collect the counts in a vector named papers.

papers <- sapply(year, search_year, term="machine learning", USE.NAMES=FALSE)
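The `sapply()` call sends one request per year to NCBI, which is 58 requests here. NCBI asks E-utilities clients to stay at or below 3 requests per second without an API key, or 10 with one. The sketch below is a minimal alternative that checks each result is a single number and pauses between calls. `count_by_year()` is a hypothetical wrapper, and the 0.35 s pause is an assumed safety margin. The wrapper takes any counting function, so it can be tried offline with a fake counter before pointing it at `search_year()`.

```r
# Hypothetical wrapper: apply count_fn to each year, pausing between requests.
# vapply() guarantees exactly one number per year (sapply() does not check).
count_by_year <- function(years, count_fn, pause = 0.35, ...) {
  vapply(years, function(y) {
    n <- count_fn(y, ...)
    Sys.sleep(pause)  # keep the request rate under roughly 3 per second
    n
  }, numeric(1))
}

# Offline check with a fake counter (no network needed)
fake_counts <- count_by_year(2000:2002, function(y) y - 1999, pause = 0)
fake_counts

# Real use (needs rentrez, network access, and search_year() from above):
# rentrez::set_entrez_key("YOUR_NCBI_API_KEY")  # optional; allows 10 requests/s
# papers <- count_by_year(1957:2014, search_year, term = "machine learning")
```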

Let’s build a data frame containing the year of publication and the number of papers.

d <- data.frame(year=year, number_of_papers=papers)
d
##    year number_of_papers
## 1  1957                1
## 2  1958                0
## 3  1959                0
## 4  1960                0
## 5  1961                0
## 6  1962                1
## 7  1963                0
## 8  1964                4
## 9  1965                1
## 10 1966                2
## 11 1967                1
## 12 1968                2
## 13 1969                2
## 14 1970                2
## 15 1971                0
## 16 1972                2
## 17 1973                1
## 18 1974                2
## 19 1975                3
## 20 1976                3
## 21 1977                1
## 22 1978                2
## 23 1979                2
## 24 1980                4
## 25 1981                1
## 26 1982                4
## 27 1983                3
## 28 1984                3
## 29 1985                5
## 30 1986                6
## 31 1987                3
## 32 1988                7
## 33 1989                8
## 34 1990                7
## 35 1991               10
## 36 1992               15
## 37 1993               22
## 38 1994               24
## 39 1995               32
## 40 1996               30
## 41 1997               34
## 42 1998               31
## 43 1999               35
## 44 2000               58
## 45 2001               82
## 46 2002               98
## 47 2003              124
## 48 2004              187
## 49 2005              248
## 50 2006              324
## 51 2007              414
## 52 2008              508
## 53 2009              600
## 54 2010              716
## 55 2011             1147
## 56 2012             1496
## 57 2013             1930
## 58 2014             2422

Now, let’s make an interactive plot of the number of papers per year of publication.

# mode="markers" (plural) is the valid plotly mode for a scatter of points;
# xref/yref='paper' place the source note relative to the whole plot area
plot_ly(data=d, x=~year, y=~number_of_papers, type="scatter", mode="markers",
        marker=list(color="blue")) %>%
  layout(title="The rise of machine learning in biomedical sciences and life sciences",
         annotations=list(x=1, y=-0.1, text="based on data from NCBI/PUBMED",
                          showarrow=FALSE, xref='paper', yref='paper',
                          xanchor='right', yanchor='auto', xshift=0, yshift=0,
                          font=list(size=10, color="blue")))
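The counts grow from 1 to more than 2,000, which is over three orders of magnitude. A logarithmic y axis can make the shape of the trend easier to read, because steady exponential growth then shows up as a straight line. This is a minimal sketch that assumes the `d` data frame from above and an installed plotly. `plot_log_counts()` and `drop_zero_years()` are hypothetical helpers. Years with zero papers are dropped first, because zero has no position on a log axis.

```r
# Years with 0 papers cannot be placed on a log axis, so drop them first
drop_zero_years <- function(df) df[df$number_of_papers > 0, ]

# Hypothetical helper: same scatter as above, but with a log10 y axis
plot_log_counts <- function(df) {
  p <- plotly::plot_ly(data = drop_zero_years(df), x = ~year,
                       y = ~number_of_papers, type = "scatter",
                       mode = "markers", marker = list(color = "blue"))
  plotly::layout(p, title = "Machine learning papers in PubMed (log scale)",
                 yaxis = list(type = "log", title = "number_of_papers"))
}

# Usage (needs plotly and the data frame d built earlier):
# plot_log_counts(d)
```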

The number of publications rises sharply from about 2000 onwards. To see this more clearly, let’s build another plot that zooms in on the years from 1990.

# Keep 1990 onwards (rows 34 to 58 of d) and join the points with a dashed line
plot_ly(data=d[d$year >= 1990, ], x=~year, y=~number_of_papers, type="scatter",
        mode="lines+markers", marker=list(color="blue"),
        line=list(color="blue", dash="dash")) %>%
  layout(title="The rise of machine learning in biomedical sciences and life sciences (1990-2014)",
         annotations=list(x=1, y=-0.1, text="based on data from NCBI/PUBMED",
                          showarrow=FALSE, xref='paper', yref='paper',
                          xanchor='right', yanchor='auto', xshift=0, yshift=0,
                          font=list(size=10, color="blue")))
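A compound annual growth rate (CAGR) puts a number on the post-2000 acceleration. It is computed as (n_end / n_start)^(1 / (end - start)) - 1. Comparing the rate before and after 2000 shows whether the rise is faster in relative terms, and not only in absolute numbers. The sketch below uses a few values copied from the table printed above. `cagr()` is a hypothetical helper, and you can pass the full `d` to it instead.

```r
# A few counts copied from the table printed above
counts <- data.frame(year = c(1990, 2000, 2005, 2010, 2014),
                     number_of_papers = c(7, 58, 248, 716, 2422))

# Hypothetical helper: compound annual growth rate between two years
cagr <- function(df, from, to) {
  n_from <- df$number_of_papers[df$year == from]
  n_to   <- df$number_of_papers[df$year == to]
  (n_to / n_from)^(1 / (to - from)) - 1
}

growth_1990s <- cagr(counts, 1990, 2000)  # about 0.235, i.e. roughly 24% per year
growth_2000s <- cagr(counts, 2000, 2014)  # about 0.305, i.e. roughly 31% per year
round(c(growth_1990s, growth_2000s), 3)
```

On these numbers the relative growth rate does rise after 2000, but far less dramatically than the absolute counts do.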

While discussing this result, I realized that Hashem et al., 2017 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5760972/) found a quite similar trend, although they used a different R package (RISmed) and plotted the proportion of papers per million. I think the roughly linear increase between 2005 and 2010 may be explained by the growing use of next-generation (high-throughput) sequencing, which gave rise to big genomic data. Analysing such big data often requires dimension reduction and other machine learning methods. Linear regression models were found to be the most widely used machine learning techniques in the life sciences over the past three decades.
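This sketch runs two quick checks related to the paragraph above. First, it fits a straight line to the 2005-2010 counts copied from the table, which shows how close to linear that stretch really is. Second, it shows the arithmetic for a per-million normalisation with a hypothetical `per_million()` helper. The sketch assumes the yearly PubMed totals come from a query on the year alone (e.g. `2014[PDAT]`). That is an assumption, and it may not match how Hashem et al. computed their denominator.

```r
# 2005-2010 counts copied from the table printed above
mid_2000s <- data.frame(year = 2005:2010,
                        number_of_papers = c(248, 324, 414, 508, 600, 716))

fit <- lm(number_of_papers ~ year, data = mid_2000s)
slope <- unname(coef(fit)["year"])  # extra papers per year (93.2 here)
r2 <- summary(fit)$r.squared        # near 1 means close to a straight line
round(c(slope = slope, r2 = r2), 3)

# Hypothetical helper: papers per million PubMed records published that year
per_million <- function(hits, totals) hits / totals * 1e6

# Hypothetical yearly total (needs rentrez and network access):
# total_year <- function(year) {
#   rentrez::entrez_search(db = "pubmed", term = sprintf("%d[PDAT]", year),
#                          retmax = 0)$count
# }
# d$per_million <- per_million(d$number_of_papers, sapply(d$year, total_year))
```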

In conclusion, more and more health researchers are using machine learning algorithms, at least judging by the growing number of PubMed papers that mention it. I can’t wait to see the impact of deep learning algorithms on life science and medicine!